ABSTRACT

The oral interview may be viewed as a 
criterion-referenced test for making either/or decisions about 
functional use of spoken language. Speech production can be tested by 
either the oral interview or the Valdis (1972) "Performance 
Objectives for Speaking," and dialogue between the two systems can be 
profitable. Current literature on criterion-referenced testing and 
performance objectives suggests that the major problem in previous 
speaking tests lay in not specifying the test's parameters. A book by
Valette and Disick and the Defense Language Institute's Handbook
both suggest specifying what the task is designed to show, the nature 
of the task, how the task shall be tested, conditions under which the 
test will be taken, and criteria used to determine performance. The 
U.S. Government regularly conducts language proficiency tests by 
means of oral interviews. The Civil Service Proficiency Definitions
rank ability in five levels from elementary to native or bilingual
proficiency. At the CIA Language Learning Center, additional
guidelines for assigning proficiency levels and language grammar
grids are also used. Guidelines cover speaking ability in subject
matter and quality, as well as understanding. Oral interview tests
are conducted to determine if a candidate communicates well enough in 
the target language to perform his job abroad, and how his 
performance compares with that of an educated native speaker. 
(CHK) 
"THE ORAL INTERVIEW -
A CRITERION-REFERENCED TEST?"1

PARDEE LOWE, JR. 
LANGUAGE LEARNING CENTER, CIA 

How do we test speaking? The honest answer is that
most of us don't. To understand why, return with me to
the thrilling days of yesteryear, the heyday of the
language laboratory. Johannes Schmidt, Ph.D., ACTFL, AATG,
sits before the master console. In front of each student
are the instructions:

"Record a five minute segment of speech 
on any topic covered in class or of your 
own choosing. Your performance will be 
graded on originality, accuracy of content 
and ability to express your thoughts in 
the target language." 

Half an hour later, surrounded by several piles of
tapes, Johannes begins to assess the performances. Soon
he is asking himself, "How did I ever get into this mess?"
"What standards should I use?" "Didn't I say grammar
would count?" ("No, you didn't, Mr. Schmidt!") Five
hours later, Schmidt has heard all twenty students' tapes
at least once. He has a migraine and flees home to four
aspirin and bed with the electric blanket turned up to
nine. Yet, in the middle of the night, he awakes with a
start! The comforting thought descends upon him - ETS
has just produced a listening comprehension tape. "I'll
grade my students on their understanding of that.
Comprehension is part of speaking, isn't it?"

"No, Mr. Schmidt, it is not!" The opposite is truer:
when someone's control of the target language is tested by
speaking to him in it, the extent to which the student
understands the question affects his ability to answer it
in any reasonable fashion. For that reason, some government
agencies do not give a separate understanding score on the
oral interview, the assumption being that understanding must
be at least the equivalent of speaking. Still, what happens
to the candidate whose special job is to monitor radio
broadcasts in a language in order to summarize the content
of the news in English and who thus never speaks a word of the
target language? In such instances, it is possible to have
4-level understanding, but 0+ level speaking! - which only
proves, Mr. Schmidt, that performance on a listening
comprehension tape does not equal performance in speaking.

Yet, assuming that performance in listening comprehension 
indicates ability in speech production was precisely the 
next step most of the rest of the profession took, too. 
For several years, I taught an intensive course in spoken 
German at a major university. Teaching assistants gave
weekly grades on students' speaking performance in class. 
But it always disturbed me that there was never any compo- 
nent in either the midterms or final examination which asked 
the candidate to produce a ratable sample of speech. 

I believe the climate is changing. With the advent of
performance objectives, individualization of instruction,
and criterion-referenced testing, I believe that we can
construct meaningful speaking tests. If testing is conducted
on the scale that Johannes Schmidt attempted, however, we are
still going to have headaches. But the seeds of a better day
were present even in Schmidt's ill-fated attempt.

Why was it doomed? The current literature on criterion-
referenced testing and performance objectives suggests that
the major problem lay in not specifying the test's
parameters. Valette and Disick (1972) and the Defense
Language Institute's Handbook (1975)2 both suggest specifying
what the task is designed to show, the nature of the task,
how the task shall be tested, the conditions under which the
test will be taken, and what percentage of mastery should
be required of the student. Using the Valdis (1972) four-
fold approach, Schmidt would now be in a position to
reformulate his original instructions:
PURPOSE: "The following task is designed
to test your ability to handle a realistic,
unknown situation in the target language.

STUDENT BEHAVIOR: In a period of not more
than three minutes, explain in a coherent
narrative how you would get information on
how to recover a lost suitcase at the
Frankfurt airport.

CONDITIONS: Record your explanation in
German on the cassette provided. No notes
will be allowed.

CRITERION: You will be graded on how naturally
you do this task; pronunciation, fluency,
grammar and suitability of vocabulary to the
task. A passing grade is performance in which
65 percent or better of the whole explanation
is free from grammar, vocabulary and
pronunciation errors."

Now, there is a ray of hope. The instructor has
specified the task to a point where both he and the
students have a fuller understanding of it. Also, because
every student is assigned the same task, the instructor
can compare performances. If his standards fail him (and
they may because they still must be worked out in detail),
he can at least place the performances in a series with 
the best performance at one end and the worst at the 
other. He can then select a cut-off point and separate 
the sheep from the goats. 
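
The pass/fail criterion and the fallback ranking just described can
be sketched in a few lines of Python. This is only an illustration
under stated assumptions: the instructions do not say how the
"65 percent" is to be counted, so the sketch assumes the rater divides
each recording into units (for example, sentences) and marks each unit
as error-free or not. The names PASSING_PERCENT, error_free_share,
passes and rank_performances, and the sample figures, are hypothetical.

```python
from fractions import Fraction

PASSING_PERCENT = 65  # threshold stated in the CRITERION paragraph


def error_free_share(total_units, units_with_errors):
    """Share of the explanation that is free of errors, as an exact fraction."""
    if total_units <= 0 or not 0 <= units_with_errors <= total_units:
        raise ValueError("unit counts are inconsistent")
    return Fraction(total_units - units_with_errors, total_units)


def passes(total_units, units_with_errors):
    """Either/or decision: True when at least 65 percent is error-free."""
    return error_free_share(total_units, units_with_errors) * 100 >= PASSING_PERCENT


def rank_performances(results):
    """Order student names from best to worst performance."""
    return sorted(
        results,
        key=lambda name: error_free_share(*results[name]),
        reverse=True,
    )


# Hypothetical class results: name -> (total units, units containing errors)
results = {"A": (20, 2), "B": (20, 9), "C": (15, 3)}
ranking = rank_performances(results)
```

The ranking covers the case in which the instructor's standards fail
him: even without a fixed threshold, the ordered list lets him choose
a cut-off point by inspection.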

Familiarity with Valdis (1972: 152-3) will allow him
to apply their "External Standards for Speaking". The
standards given are particularly useful for testing
microsegments of learning at lower ends of the language learning
spectrum. What happens, however, when an instructor wishes
to test a student's overall ability in the target language
for some macrosegment of learning: at the end of the high
school course, at the end of the basic college course, at
the end of an undergraduate major, or at the end of graduate
school?

A few years ago, it was relatively unknown that the 
United States Government regularly conducted such testing 
by means of an oral interview. Recently, several articles 
and books have referred to the Government test: Clark (1972); 
Jones (1975) and Jones (To Appear); Weinstein (1975); Wilds 
(1975) and the DLI Handbook (1975). The oral interview
started in the 1950's while criterion-referenced testing
appeared about a decade later. Although the Proficiency
Definitions for the oral interview have been revised several 
times, I have been unable so far to establish any cross- 
fertilization. It appears to be an example of polygenesis. 

The Government required a test to determine how Foreign 
Service Officers would perform their jobs abroad using the 
target language. Thus in the mid 1950's, the Proficiency 
Definitions and the oral interview were devised at the Foreign 
Service Institute. This type of testing is now used not only
in the Foreign Service Institute, but in the Central
Intelligence Agency, the Defense Language Institute, the Peace
Corps and for SHAPE. In the mid 1960's the definitions were
accepted by the Civil Service Commission for the whole
Government.

TABLE I

CIVIL SERVICE PROFICIENCY DEFINITIONS

Elementary Proficiency 

S-1 Able to satisfy routine travel needs and minimum courtesy
requirements. Can ask and answer questions on topics very
familiar to him; within the scope of his very limited
language experience can understand simple questions and
statements, allowing for slowed speech, repetition, or
paraphrase; speaking vocabulary inadequate to express
anything but the most elementary needs; errors in
pronunciation and grammar are frequent, but can be
understood by a native speaker used to dealing with foreigners
attempting to speak his language; while topics which are
"very familiar" and elementary needs vary considerably
from individual to individual, any person at the S-1
level should be able to order a simple meal, ask for
shelter or lodging, ask and give simple directions, make
purchases, and tell time.

Limited Working Proficiency 

S-2 Able to satisfy routine social demands and limited work 
requirements. Can handle with confidence but not with 
facility most social situations, including introductions 
and casual conversations about current events as well as 
work, family, and autobiographical information; can 
handle limited work requirements, needing help in handling
any complications or difficulties; can get the gist of
most conversations on non-technical subjects (i.e., topics
which require no specialized knowledge) and has a speaking
vocabulary sufficient to express himself simply with some
circumlocutions; accent, though often quite faulty, is
intelligible; can usually handle elementary constructions 
quite accurately but does not have thorough or confident 
control of the grammar. 

Minimal Professional Proficiency 

S-3 Able to speak the language with sufficient structural
accuracy and vocabulary to participate in most formal and
informal conversations on practical, social, and
professional topics. Can discuss particular interests and
special fields of competence with reasonable ease;
comprehension is quite complete for a normal rate of speech;
vocabulary is broad enough that he rarely has to grope 
for a word; accent may be obviously foreign; control of 
grammar good; errors never interfere with understanding 
and rarely disturb the native speaker. 

Full Professional Proficiency

S-4 Able to use the language fluently and accurately on all 
levels normally pertinent to professional needs. Can 
understand and participate in any conversation within 
the range of his experience with a high degree of 
fluency and precision of vocabulary; would rarely be 
taken for a native speaker, but can respond appropriately
even in unfamiliar situations; errors of pronunciation 
and grammar quite rare; can handle informal interpreting 
from and into the language. 

Native or Bilingual Proficiency 

S-5 Speaking proficiency equivalent to that of an educated 
native speaker. Has complete fluency in the language 
such that his speech on all levels is fully accepted by 
educated native speakers in all of its features, including 
breadth of vocabulary and idiom, colloquialisms, and 
pertinent cultural references.

Although not employing the same language as the Valdis
(1972) Performance Objectives, the definitions describe tasks
in a similar fashion. Because the most desirable level in
government work is Level 3 (often designated "minimal
professional proficiency"), I direct your attention particularly
to its wording in Table I.



At the Language Learning Center, much of the material for
performance objectives not contained in the Proficiency
Definitions themselves is provided by two other documents:
"Some Guidelines for Assigning Language Proficiency Levels"
(see Table II) and a Grammar Grid for each language. The
Proficiency Definitions and the Guidelines for Rating apply
to all languages while Grammar Grids are language-specific in
those languages where they are available. These documents
attempt to characterize both in general terms and in specific
grammatical terms for a given language the domains of behavior
and tasks at each level. Finally, these documents suggest
how representative outcomes should be rated.

TABLE II: GUIDELINES 

The system works as follows: a government employee claims
to speak French. Slated for a language-designated position, he
is scheduled for a test. At the time of the interview, he is
introduced to the two testers, and they proceed to have a chat!
Yet, the oral test is a probing "conversational interview", in
which the candidate's control of general language is put to
the test by a series of standardized elicitation techniques.
While it is not a direct test in the sense that a special section
tests vocabulary, another grammar, etc., by its end (10-30
minutes) the testers can assign the candidate a rating on the
scale of 0 (for no practical ability to communicate in the
language) to 5 (for performance like that of an educated native
speaker). The scale provides plusses for unusually strong
performance at a given level so that the resultant scale
furnishes 11 distinctions. All tests are taped for verification
by a third rater.
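
The arithmetic behind the 11 distinctions can be made explicit with a
short Python sketch. It assumes that each level from 0 through 4 has a
plus and that Level 5, the top of the scale, does not; that reading is
an assumption, though it is the one that yields exactly 11 ratings.
The names build_scale, position and meets_requirement are hypothetical
and are not part of any government procedure.

```python
LOWEST_LEVEL = 0   # no practical ability to communicate
HIGHEST_LEVEL = 5  # performance like that of an educated native speaker


def build_scale():
    """Return the ratings in ascending order: 0, 0+, 1, 1+, ... 4+, 5."""
    labels = []
    for level in range(LOWEST_LEVEL, HIGHEST_LEVEL + 1):
        labels.append(str(level))
        if level < HIGHEST_LEVEL:  # assumption: no plus above Level 5
            labels.append(f"{level}+")
    return labels


SCALE = build_scale()


def position(rating):
    """Index of a rating on the scale; raises ValueError if unknown."""
    return SCALE.index(rating)


def meets_requirement(rating, required):
    """Either/or decision: does the rating reach the required level?"""
    return position(rating) >= position(required)
```

Under these assumptions the scale has six base levels and five plus
levels, and a candidate rated 2+ would not satisfy a position that
requires Level 3.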

The test is conducted with one basic question in mind: 

"Can the candidate communicate well enough in the 
target language to perform his job abroad?" 

His performance is rated on: 

"How close does the performance come to that of 
an educated native speaker in the same topic(s) 
and/or situation(s) ?" 

In the oral interview, candidates are not compared to
a set of norms nor are they assigned scores comparing them
to one another in the norm-referenced sense with one
receiving 95, the next 92, etc. Like criterion-referenced
testing, the system is used for making either/or decisions:
take job/don't take job; continue training/discontinue
training; place in class which has already started/do not
place in such a class. These are the kinds of decisions
which the government has been making for over 20 years based
on the oral interview and making with a high degree of success.

Yet, the government test differs in two important 
respects from the other kinds of criterion-referenced testing 
described in most of the literature: 

First, the oral interview is a proficiency test, not an
achievement test. The question is not "how much did the
candidate learn of Chapter 4 of his Spanish text?" but rather
"how well does the candidate speak the target language
compared to the performance of an 'educated native speaker'?"
"Educated" does not mean aesthetic-literary education, but
acquaintance with how his language works through schooling
equivalent to lycee, liceo, Gymnasium etc. in Europe or a
four-year liberal arts education in the USA.

Second, the oral interview normally tests candidates at 
a higher level than the performance objectives given in
Valdis (1972) and their attendant examples. 
A comparison of the two systems reveals that the oral
interview falls mainly in Stage 4 of the Valdis (1972)
system. Its position there is based on the major criterion
of the proficiency definitions: "How well does the candidate
communicate in the language?" The ability to communicate
plays a central role even at the lowest levels. A stage
by stage comparison shows that Valdis (1972) Stages 1 & 2,
Mechanical Skills & Knowledge respectively, dominate at the
lowest Levels 0+ and 1 of the government's Proficiency
Definitions; Valdis (1972) Stage 3: Transfer is an integral
part of Level 1; while Valdis (1972) Stage 4: Communication,
as defined by their examples, is an essential part of Levels
1+ through 3, with Valdis (1972) Stage 5: Criticism part and
parcel of Levels 4 & 5 in the Proficiency Definitions.

In all fairness to the Valdis (1972) system, my
understanding of it derives less from the wording of their
performance objectives, although such statements are infinitely
more informative than the vague verbiage of yore, than from
their examples. Frankly, there are similar problems in
interpreting the government's Proficiency Definitions. Thus,
government testers must be trained in how to use the system,
what the standards mean, and how to elicit a ratable sample
of speech from the candidate. After training, the standards
are best preserved if the testers test frequently (in our
case, one or more times a week), and if, when they are in
doubt about a given test, they return to check the
definitions and listen to sample calibrated tapes as examples
of the levels they may be dealing with.

The rating contains a subjective element. This is
partially controllable by the measures cited above and by
the fact that raters at the Language Learning Center test
in pairs and rate individually on the overall impression
they have of the candidate's speech in the target language.
The Proficiency Definitions furnish the basic parameters
at each level. But few tests are paradigmatic examples of
a given level. The challenge in testing lies in balancing
the various possible combinations, for example, bad
pronunciation with good grammar and limited vocabulary. Basically,
the definitions are functional. A grammar grid may specify
some control of subjunctive in German at Level 2+, but a
candidate may deal with 2+ level topics with adequate grammar
and never use a subjunctive. Assuming that all other structures,
pronunciation, fluency and vocabulary speak for his being
rated at Level 2+, he will receive that rating. It is likely
to be a less strong 2+ than one with the subjunctive, but it
will be a 2+. As Jones (To Appear) has rightly pointed out,
the subjective elements in testing cannot all be eradicated.
However, our task is to minimize them.

Space scarcely suffices to mention the basic issues, let
alone tackle the substance of two different systems. Still, I
hope to have shown the main ways in which the oral interview
may be viewed as a criterion-referenced test with a major
criterion against which all performance can be judged,
non-norm-referenced rating, and either/or decisions about functional
use of spoken language. Further, I hope to have shown that it
is possible to test speech production by either the oral
interview or the Valdis (1972) "Performance Objectives for
Speaking" and, finally, that there can be a profitable dialogue
between the two systems.

We left poor Johannes Schmidt huddling in his bed, 
totally dependent on a listening comprehension tape for 
testing speech production. It is time to tell him:

"He! Johannes, schlaf nicht!"

FOOTNOTES TO "THE ORAL INTERVIEW"

1 

The following paper was delivered in the Section on 
"Foreign Language Testing: A Time for New Directions" at the 
1975 FIPLV/AATG/ACTFL Convention in Washington, D.C. in
November 1975. It has been revised slightly in the 
interim. 

2

Hereafter referred to as Valdis (1972) and the DLI Handbook
(1975) respectively. 

3 

Similarly, Valdis (1972) begins with a taxonomy delimiting
the field; then specifies speaking in terms of Performance
Objectives at the Five Stages; and finally provides
illustrative examples. 

4 

Two levels in the Proficiency Definitions make explicit 
that communication in this sense is strongly tempered by 
grammatical accuracy: Level 5 demands the grammatical accuracy
of a high-level diplomat while Level 3 requires consistent
accuracy with a few errors permitted in the "core grammar"
of the target language and a larger number of errors in
less frequent structures if they are used at all. Testers
generally allow for the primacy of grammar by weighting
grammar more heavily than vocabulary and both of the above
more than either pronunciation or fluency (see Wilds 1975:
32).